1
Introduction
Recently, we have witnessed a trend in deep learning in which models are rapidly increasing
in complexity [84, 211, 220, 90, 205, 286]. However, the host hardware on which these
models are deployed has yet to keep up performance-wise, owing to practical limitations such
as latency, battery life, and operating temperature. The result is a large and ever-increasing
gap between computational demands and available resources. To address this issue, network
quantization [48, 199, 115, 149], which maps single-precision floating-point weights or
activations to lower-bit integers for compression and acceleration, has attracted considerable
research attention.
The binary neural network (BNN) is the simplest form of low-bit network and has gained
much attention due to its highly compressed parameters and activation features [48]. The
best-known company focused on BNNs is the artificial intelligence startup Xnor.ai. Founded
in 2016, the company raised substantial funding to build tools that help AI algorithms run
on devices rather than in remote data centers. Apple Inc. acquired the company, planning to
apply BNN technology to its own devices to keep user information more private and to speed
up processing.
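To make these two mappings concrete, the NumPy sketch below illustrates uniform quantization to low-bit integers and the extreme 1-bit (binarization) case. It is a minimal illustration, not a particular method from the literature: the function names are ours, the scale is per-tensor, and the scaling factor alpha = mean(|w|) follows the XNOR-Net-style convention.

```python
import numpy as np

def quantize_uniform(w, num_bits=8):
    """Map float weights to signed 'num_bits'-bit integers (per-tensor scale)."""
    qmax = 2 ** (num_bits - 1) - 1           # e.g. 127 for 8 bits
    scale = np.abs(w).max() / qmax           # one scale for the whole tensor
    q = np.clip(np.round(w / scale), -qmax, qmax).astype(np.int32)
    return q, scale                          # dequantize as q * scale

def binarize(w):
    """1-bit case: keep only the sign plus one scaling factor per tensor."""
    alpha = np.abs(w).mean()                 # illustrative XNOR-Net-style scale
    b = np.where(w >= 0, 1.0, -1.0)          # map 0 to +1 by convention
    return b, alpha                          # reconstruct as alpha * b

w = np.random.randn(4, 4).astype(np.float32)
q, s = quantize_uniform(w)                   # 8-bit integer weights
b, a = binarize(w)                           # {-1, +1} weights
```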
This chapter reviews recent advances in BNN technologies that are well suited for front-end,
edge-based computing. We introduce and summarize existing work, classifying it by gradient
approximation, quantization, architecture, loss function, optimization method, and binary
neural architecture search. We also introduce applications in computer vision and speech
recognition and discuss future applications of BNNs.
Deep learning has become increasingly important because of its superior performance, yet
it suffers from a large memory footprint and high computational cost, making it difficult
to deploy on front-end devices. For example, in unmanned systems, UAVs serve as computing
terminals with limited memory and computing resources, which makes real-time data
processing based on convolutional neural networks (CNNs) difficult. To improve storage and
computational efficiency, BNNs have shown promise for practical applications. BNNs are
neural networks whose weights are binarized. 1-bit CNNs are a highly compressed variant
of BNNs that binarize both the weights and the activations to decrease model size and
computational cost; this high degree of compression makes them well suited for front-end
computing. Besides these two, other compression techniques, such as pruning and sparse
neural networks, are also widely used in edge computing.
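The computational appeal of the 1-bit case can be seen in a few lines: once weights and activations are in {-1, +1}, the floating-point dot products inside a convolution collapse to bitwise operations plus a population count. The sketch below, with illustrative helper names of our own, packs sign vectors into Python integers and checks the identity sum(a_i * b_i) = n - 2 * popcount(a XOR b).

```python
import numpy as np

def pack(v):
    """Pack a {-1, +1} vector into an integer: bit i is 1 where v[i] == +1."""
    return sum(1 << i for i, s in enumerate(v) if s > 0)

def binary_dot(xb, wb, n):
    """Dot product of two {-1, +1} vectors of length n from their packed bits.
    Agreeing positions contribute +1 and disagreeing ones -1, hence
    sum(a_i * b_i) = n - 2 * popcount(a XOR b)."""
    return n - 2 * bin(xb ^ wb).count("1")

x = np.where(np.random.randn(64) >= 0, 1.0, -1.0)
w = np.where(np.random.randn(64) >= 0, 1.0, -1.0)
assert binary_dot(pack(x), pack(w), 64) == int(x @ w)
```

This identity is what lets binary convolutions replace most floating-point multiply-accumulate operations with cheap bitwise instructions on commodity hardware.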
This chapter reviews the main advances in BNNs and 1-bit CNNs. Although binarization makes
neural networks far more efficient, it almost always causes a significant drop in accuracy.
Over the last five years, many methods have been introduced to close this gap. To review
them systematically, we organize the discussion around six aspects: gradient approximation,
quantization, structural design, loss design, optimization, and binary neural architecture
search. Finally, we review applications of BNNs to object detection, object tracking, and
audio analysis.
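Of these six aspects, gradient approximation addresses the fact that the sign function has zero gradient almost everywhere, so binarized layers cannot be trained by ordinary backpropagation. The PyTorch sketch below shows the widely used straight-through estimator (STE); the class name is ours, and the clipping to |x| <= 1 is the common hard-tanh variant rather than the only choice.

```python
import torch

class BinarizeSTE(torch.autograd.Function):
    """Sign in the forward pass, straight-through estimator in the backward pass."""

    @staticmethod
    def forward(ctx, x):
        ctx.save_for_backward(x)
        # Binarize to {-1, +1}, mapping 0 to +1 by convention.
        return torch.where(x >= 0, torch.ones_like(x), -torch.ones_like(x))

    @staticmethod
    def backward(ctx, grad_out):
        (x,) = ctx.saved_tensors
        # Pretend the forward pass was the identity (clipped to |x| <= 1):
        # pass gradients through inside the clip range, zero them outside.
        return grad_out * (x.abs() <= 1).to(grad_out.dtype)

x = torch.randn(8, requires_grad=True)
BinarizeSTE.apply(x).sum().backward()   # x.grad is 1 where |x| <= 1, else 0
```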